{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Spark JDBC to Databases\n",
    "\n",
    "- [Overview](#spark-jdbc-overview)\n",
    "- [Setup](#spark-jdbc-setup)\n",
    "  - [Define Environment Variables](#spark-jdbc-define-envir-vars)\n",
    "  - [Initiate a Spark JDBC Session](#spark-jdbc-init-session)\n",
    "    - [Load Driver Packages Dynamically](#spark-jdbc-init-dynamic-pkg-load)\n",
    "    - [Load Driver Packages Locally](#spark-jdbc-init-local-pkg-load)\n",
    "- [Connect to Databases Using Spark JDBC](#spark-jdbc-connect-to-dbs)\n",
    " - [Connect to a MySQL Database](#spark-jdbc-to-mysql)\n",
    "   - [Connecting to a Public MySQL Instance](#spark-jdbc-to-mysql-public)\n",
    "   - [Connecting to a Test or Temporary MySQL Instance](#spark-jdbc-to-mysql-test-or-temp)\n",
    " - [Connect to a PostgreSQL Database](#spark-jdbc-to-postgresql)\n",
    " - [Connect to an Oracle Database](#spark-jdbc-to-oracle)\n",
    " - [Connect to an MS SQL Server Database](#spark-jdbc-to-ms-sql-server)\n",
    " - [Connect to a Redshift Database](#spark-jdbc-to-redshift)\n",
    "- [Cleanup](#spark-jdbc-cleanup)\n",
    "  - [Delete Data](#spark-jdbc-delete-data)\n",
    "  - [Release Spark Resources](#spark-jdbc-release-spark-resources)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-overview\"></a>\n",
    "## Overview\n",
    "\n",
    "Spark SQL includes a data source that can read data from other databases using Java database connectivity (**JDBC**).\n",
    "The results are returned as a Spark DataFrame that can easily be processed in Spark SQL or joined with other data sources.\n",
    "For more information, see the [Spark documentation](https://spark.apache.org/docs/2.3.1/sql-programming-guide.html#jdbc-to-other-databases)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-setup\"></a>\n",
    "## Setup"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-define-envir-vars\"></a>\n",
    "### Define Environment Variables\n",
    "\n",
    "Begin by initializing some environment variables.\n",
    "\n",
    "> **Note:** You need to edit the following code to assign valid values to the database variables (`DB_XXX`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "env: DB_HOST=\"\"        # Database host as a fully qualified name (FQN)\n",
      "env: DB_PORT=\"\"        # Database port number\n",
      "env: DB_DRIVER=\"\"      # Database driver [mysql/postgresql|oracle:thin|sqlserver]\n",
      "env: DB_Name=\"\"        # Database|schema name\n",
      "env: DB_TABLE=\"\"       # Table name\n",
      "env: DB_USER=\"\"        # Database username\n",
      "env: DB_PASSWORD=\"\"    # Database user password\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "\n",
    "# Read Iguazio Data Science Platform (\"the platform\") environment variables into local variables\n",
    "V3IO_USER = os.getenv('V3IO_USERNAME')\n",
    "V3IO_HOME = os.getenv('V3IO_HOME')\n",
    "V3IO_HOME_URL = os.getenv('V3IO_HOME_URL')\n",
    "\n",
    "# Define database environment variables\n",
    "# TODO: Edit the variable definitions to assign valid values for your environment.\n",
    "%env DB_HOST = \"\"        # Database host as a fully qualified name (FQN)\n",
    "%env DB_PORT = \"\"        # Database port number\n",
    "%env DB_DRIVER = \"\"      # Database driver [mysql/postgresql|oracle:thin|sqlserver]\n",
    "%env DB_Name = \"\"        # Database|schema name\n",
    "%env DB_TABLE = \"\"       # Table name\n",
    "%env DB_USER = \"\"        # Database username\n",
    "%env DB_PASSWORD = \"\"    # Database user password\n",
    "\n",
    "os.environ[\"PYSPARK_SUBMIT_ARGS\"] = \"--packages mysql:mysql-connector-java:5.1.39 pyspark-shell\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-init-session\"></a>\n",
    "### Initiate a Spark JDBC Session\n",
    "\n",
    "You can select between two methods for initiating a Spark session with JDBC drivers (\"Spark JDBC session\"):\n",
    "\n",
    "- [Load Driver Packages Dynamically](#spark-jdbc-init-dynamic-pkg-load) (preferred)\n",
    "- [Load Driver Packages Locally](#spark-jdbc-init-local-pkg-load)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-init-dynamic-pkg-load\"></a>\n",
    "#### Load Driver Packages Dynamically\n",
    "\n",
    "The preferred method for initiating a Spark JDBC session is to load the required JDBC driver packages dynamically from https://spark-packages.org/ by doing the following:\n",
    "\n",
    "1. Set the `PYSPARK_SUBMIT_ARGS` environment variable to `\"--packages <group>:<name>:<version> pyspark-shell\"`.\n",
    "2. Initiate a new spark session.\n",
    "\n",
    "The following example demonstrates how to initiate a Spark session that uses version 5.1.39 of the **mysql-connector-java** MySQL JDBC database driver (`mysql:mysql-connector-java:5.1.39`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark.conf import SparkConf\n",
    "from pyspark.sql import SparkSession\n",
    "\n",
    "# Configure the Spark JDBC driver package\n",
    "# TODO: Replace `mysql:mysql-connector-java:5.1.39` with the required driver-pacakge information.\n",
    "os.environ[\"PYSPARK_SUBMIT_ARGS\"] = \"--packages mysql:mysql-connector-java:5.1.39 pyspark-shell\"\n",
    "\n",
    "# Initiate a new Spark session; you can change the application name\n",
    "spark = SparkSession.builder.appName(\"Spark JDBC tutorial\").getOrCreate()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-init-local-pkg-load\"></a>\n",
    "#### Load Driver Packages Locally\n",
    "\n",
    "You can also load the Spark JDBC driver package from the local file system of your Iguazio Data Science Platform (\"the platform\").\n",
    "It's recommended that you use this method only if you don't have internet connection (\"dark-site installations\") or if there's no official Spark package for your database.\n",
    "The platform comes pre-deployed with MySQL, PostgreSQL, Oracle, Redshift, and MS SQL Server JDBC driver packages, which are found in the **/spark/3rd_party** directory (**$SPARK_HOME/3rd_party**).\n",
    "You can also copy additional driver packages or different versions of the pre-deployed drivers to the platform &mdash; for example, from the **Data** dashboard page.\n",
    "\n",
    "To load a JDBC driver package locally, you need to set the `spark.driver.extraClassPath` and `spark.executor.extraClassPath` Spark configuration properties to the path to a Spark JDBC driver package in the platform's file system.\n",
    "You can do this using either of the following alternative methods:\n",
    "\n",
    "- Preconfigure the path to the driver package &mdash;\n",
    "\n",
    "  1. In your Spark-configuration file &mdash; **$SPARK_HOME/conf/spark-defaults.conf** &mdash; set the `extraClassPath` configuration properties to the path to the relevant driver package:\n",
    "    ```python\n",
    "    spark.driver.extraClassPath = \"<path to a JDBC driver package>\"\n",
    "    spark.executor.extraClassPath = \"<path to a JDBC driver package>\"\n",
    "    ```\n",
    "  2. Initiate a new spark session.\n",
    "\n",
    "- Configure the path to the driver package as part of the initiation of a new Spark session:\n",
    "  ```python\n",
    "  spark = SparkSession.builder. \\\n",
    "    appName(\"<app name>\"). \\\n",
    "    config(\"spark.driver.extraClassPath\", \"<path to a JDBC driver package>\"). \\\n",
    "    config(\"spark.executor.extraClassPath\", \"<path to a JDBC driver package>\"). \\\n",
    "    getOrCreate()\n",
    "  ```\n",
    "\n",
    "The following example demonstrates how to initiate a Spark session that uses the pre-deployed version 8.0.13 of the **mysql-connector-java** MySQL JDBC database driver (**/spark/3rd_party/mysql-connector-java-8.0.13.jar**)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from pyspark.conf import SparkConf\n",
    "from pyspark.sql import SparkSession\n",
    "\n",
    "# METHOD I\n",
    "# Edit your Spark configuration file ($SPARK_HOME/conf/spark-defaults.conf), set the `spark.driver.extraClassPath` and\n",
    "# `spark.executor.extraClassPath` properties to the local file-system path to a pre-deployed Spark JDBC driver package.\n",
    "# Replace \"/spark/3rd_party/mysql-connector-java-8.0.13.jar\" with the relevant path.\n",
    "#     spark.driver.extraClassPath = \"/spark/3rd_party/mysql-connector-java-8.0.13.jar\"\n",
    "#     spark.executor.extraClassPath = \"/spark/3rd_party/mysql-connector-java-8.0.13.jar\"\n",
    "#\n",
    "# Then, initiate a new Spark session; you can change the application name.\n",
    "# spark = SparkSession.builder.appName(\"Spark JDBC tutorial\").getOrCreate()\n",
    "\n",
    "# METHOD II\n",
    "# Initiate a new Spark Session; you can change the application name.\n",
    "# Set the same `extraClassPath` configuration properties as in Method #1 as part of the initiation command.\n",
    "# Replace \"/spark/3rd_party/mysql-connector-java-8.0.13.jar\" with the relevant path.\n",
    "# local file-system path to a pre-deployed Spark JDBC driver package\n",
    "spark = SparkSession.builder. \\\n",
    "    appName(\"Spark JDBC tutorial\"). \\\n",
    "    config(\"spark.driver.extraClassPath\", \"/spark/3rd_party/mysql-connector-java-8.0.13.jar\"). \\\n",
    "    config(\"spark.executor.extraClassPath\", \"/spark/3rd_party/mysql-connector-java-8.0.13.jar\"). \\\n",
    "    getOrCreate()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('spark.driver.host', 'jupyter-79b887f4f5-p54md'),\n",
      " ('spark.driver.extraLibraryPath', '/hadoop/etc/hadoop'),\n",
      " ('spark.app.id', 'local-1583935133790'),\n",
      " ('spark.executor.id', 'driver'),\n",
      " ('spark.files',\n",
      "  'file:///igz/java/libs/v3io-py.zip,file:///igz/.ivy2/jars/mysql_mysql-connector-java-5.1.39.jar'),\n",
      " ('spark.jars',\n",
      "  'file:///spark/v3io-libs/v3io-hcfs_2.11.jar,file:///spark/v3io-libs/v3io-spark2-object-dataframe_2.11.jar,file:///spark/v3io-libs/v3io-spark2-streaming_2.11.jar,file:///igz/.ivy2/jars/mysql_mysql-connector-java-5.1.39.jar'),\n",
      " ('spark.app.name', 'Spark JDBC tutorial'),\n",
      " ('spark.executor.extraJavaOptions', '\"-Dsun.zip.disableMemoryMapping=true\"'),\n",
      " ('spark.driver.extraClassPath',\n",
      "  '/spark/3rd_party/mysql-connector-java-8.0.13.jar'),\n",
      " ('spark.submit.pyFiles',\n",
      "  '/igz/java/libs/v3io-py.zip,/igz/.ivy2/jars/mysql_mysql-connector-java-5.1.39.jar'),\n",
      " ('spark.executor.extraLibraryPath', '/hadoop/etc/hadoop'),\n",
      " ('spark.driver.port', '38207'),\n",
      " ('spark.repl.local.jars',\n",
      "  'file:///spark/v3io-libs/v3io-hcfs_2.11.jar,file:///spark/v3io-libs/v3io-spark2-object-dataframe_2.11.jar,file:///spark/v3io-libs/v3io-spark2-streaming_2.11.jar,file:///igz/.ivy2/jars/mysql_mysql-connector-java-5.1.39.jar'),\n",
      " ('spark.rdd.compress', 'True'),\n",
      " ('spark.serializer.objectStreamReset', '100'),\n",
      " ('spark.master', 'local[*]'),\n",
      " ('spark.submit.deployMode', 'client'),\n",
      " ('spark.executor.extraClassPath',\n",
      "  '/spark/3rd_party/mysql-connector-java-8.0.13.jar'),\n",
      " ('spark.driver.extraJavaOptions', '\"-Dsun.zip.disableMemoryMapping=true\"'),\n",
      " ('spark.ui.showConsoleProgress', 'true')]\n"
     ]
    }
   ],
   "source": [
    "import pprint\n",
    "\n",
    "# Verify your configuration: run the following code to list the current Spark configurations, and check the output to verify that the\n",
    "# `spark.driver.extraClassPath` and `spark.executor.extraClassPath` properties are set to the correct local driver-pacakge path.\n",
    "conf = spark.sparkContext._conf.getAll()\n",
    "\n",
    "pprint.pprint(conf)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-connect-to-dbs\"></a>\n",
    "## Connect to Databases Using Spark JDBC"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-mysql\"></a>\n",
    "### Connect to a MySQL Database\n",
    "\n",
    "- [Connecting to a Public MySQL Instance](#spark-jdbc-to-mysql-public)\n",
    "- [Connecting to a Test or Temporary MySQL Instance](#spark-jdbc-to-mysql-test-or-temp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-mysql-public\"></a>\n",
    "#### Connect to a Public MySQL Instance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "+--------+-------------+---------+--------------------+--------------------+--------------------+----------------+--------------+------------+--------------------+--------------------+------------------+--------------------+--------------------+--------+--------+--------------+----------+--------------------+--------------------+-----------------+--------------------+---------------+--------+------------+---------+------------+--------------+----+----+---------------+-------+----------+-------------------+-------------------+\n",
      "|rfam_acc|      rfam_id|auto_wiki|         description|              author|         seed_source|gathering_cutoff|trusted_cutoff|noise_cutoff|             comment|         previous_id|           cmbuild|         cmcalibrate|            cmsearch|num_seed|num_full|num_genome_seq|num_refseq|                type|    structure_source|number_of_species|number_3d_structures|num_pseudonokts|tax_seed|ecmli_lambda| ecmli_mu|ecmli_cal_db|ecmli_cal_hits|maxl|clen|match_pair_node|hmm_tau|hmm_lambda|            created|            updated|\n",
      "+--------+-------------+---------+--------------------+--------------------+--------------------+----------------+--------------+------------+--------------------+--------------------+------------------+--------------------+--------------------+--------+--------+--------------+----------+--------------------+--------------------+-----------------+--------------------+---------------+--------+------------+---------+------------+--------------+----+----+---------------+-------+----------+-------------------+-------------------+\n",
      "| RF00001|      5S_rRNA|     1302|    5S ribosomal RNA|Griffiths-Jones S...|Szymanski et al, ...|            38.0|          38.0|        37.9|5S ribosomal RNA ...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     712|  139932|             0|         0|         Gene; rRNA;|Published; PMID:1...|             8022|                   0|           null|        |     0.59496| -5.32219|     1600000|        213632| 305| 119|           true|-3.7812|   0.71822|2013-10-03 20:41:44|2019-01-04 15:01:52|\n",
      "| RF00002|    5_8S_rRNA|     1303|  5.8S ribosomal RNA|Griffiths-Jones S...|Wuyts et al, Euro...|            42.0|          42.0|        41.9|5.8S ribosomal RN...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      61|    4716|             0|         0|         Gene; rRNA;|Published; PMID:1...|              587|                   0|           null|        |     0.65546| -9.33442|     1600000|        410495| 277| 154|           true|-3.5135|   0.71791|2013-10-03 20:47:00|2019-01-04 15:01:52|\n",
      "| RF00003|           U1|     1304| U1 spliceosomal RNA|Griffiths-Jones S...|Zwieb C, The uRNA...|            40.0|          40.0|        39.9|U1 is a small nuc...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     100|   15436|             0|         0|Gene; snRNA; spli...|Published; PMID:2...|              837|                   0|           null|        |      0.6869| -8.54663|     1600000|        421575| 267| 166|           true|-3.7781|   0.71616|2013-10-03 20:57:11|2019-01-04 15:01:52|\n",
      "| RF00004|           U2|     1305| U2 spliceosomal RNA|Griffiths-Jones S...|The uRNA database...|            46.0|          46.0|        45.9|U2 is a small nuc...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     208|   16562|             0|         0|Gene; snRNA; spli...|Published; PMID:2...|             1102|                   0|           null|        |     0.55222| -9.81359|     1600000|        403693| 301| 193|           true|-3.5144|   0.71292|2013-10-03 20:58:30|2019-01-04 15:01:52|\n",
      "| RF00005|         tRNA|     1306|                tRNA|Eddy SR, Griffith...|             Eddy SR|            29.0|          29.0|        28.9|Transfer RNA (tRN...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     954| 1429129|             0|         0|         Gene; tRNA;|Published; PMID:8...|             9934|                   0|           null|        |     0.64375| -4.21424|     1600000|        283681| 253|  71|           true|-2.6167|   0.73401|2013-10-03 21:00:26|2019-01-04 15:01:52|\n",
      "| RF00006|        Vault|     1307|           Vault RNA|Bateman A, Gardne...|Published; PMID:1...|            34.0|          34.1|        33.9|This family of RN...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      73|     564|             0|         0|               Gene;|Published; PMID:1...|               94|                   0|           null|        |     0.63669|  -4.8243|     1600000|        279629| 406| 101|           true|-3.5531|   0.71855|2013-10-03 22:04:04|2019-01-04 15:01:52|\n",
      "| RF00007|          U12|     1308|U12 minor spliceo...|Griffiths-Jones S...|Shukla GC and Pad...|            53.0|          53.0|        52.9|The U12 small nuc...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      62|     531|             0|         0|Gene; snRNA; spli...|Predicted; Griffi...|              336|                   0|           null|        |     0.55844| -9.95163|     1600000|        493455| 520| 155|           true|-3.1678|   0.71782|2013-10-03 22:04:07|2019-01-04 15:01:52|\n",
      "| RF00008| Hammerhead_3|     1309|Hammerhead ribozy...|           Bateman A|           Bateman A|            29.0|          29.0|        28.9|The hammerhead ri...|          Hammerhead|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      82|    3098|             0|         0|     Gene; ribozyme;|Published; PMID:7...|              176|                   0|           null|        |     0.63206| -3.83325|     1600000|        199872| 394|  58|           true| -4.375|   0.71923|2013-10-03 22:04:11|2019-01-04 15:01:52|\n",
      "| RF00009|   RNaseP_nuc|     1310|     Nuclear RNase P|Griffiths-Jones S...|Brown JW, The Rib...|            28.0|          28.0|        27.9|Ribonuclease P (R...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     116|    1237|             0|         0|     Gene; ribozyme;|Published; PMID:9...|              763|                   0|           null|        |      0.7641| -8.04053|     1600000|        274636|1082| 303|           true|-4.3673|   0.70576|2013-10-03 22:04:14|2019-01-04 15:01:52|\n",
      "| RF00010|RNaseP_bact_a|     2441|Bacterial RNase P...|Griffiths-Jones S...|Brown JW, The Rib...|           100.0|         100.5|        99.6|Ribonuclease P (R...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     458|    6023|             0|         0|     Gene; ribozyme;|Published; PMID:9...|             6324|                   0|           null|        |     0.76804| -8.48988|     1600000|        366265| 873| 367|           true|-4.3726|   0.70355|2013-10-03 22:04:21|2019-01-04 15:01:52|\n",
      "| RF00011|RNaseP_bact_b|     2441|Bacterial RNase P...|Griffiths-Jones S...|Brown JW, The Rib...|            97.0|          97.1|        96.6|Ribonuclease P (R...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     114|     676|             0|         0|     Gene; ribozyme;|Published; PMID:9...|              767|                   0|           null|        |     0.69906|  -8.4903|     1600000|        418092| 675| 366|           true|-4.0357|   0.70361|2013-10-03 22:04:51|2019-01-04 15:01:52|\n",
      "| RF00012|           U3|     1312|Small nucleolar R...|  Gardner PP, Marz M|Published; PMID:1...|            34.0|          34.0|        33.9|Small nucleolar R...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      87|    3924|             0|         0|Gene; snRNA; snoR...|Published; PMID:1...|              416|                   0|           null|        |     0.59795| -9.77278|     1600000|        400072| 326| 218|           true|-3.8301|   0.71077|2013-10-03 22:04:54|2019-01-04 15:01:52|\n",
      "| RF00013|           6S|     2461|       6S / SsrS RNA|Bateman A, Barric...|          Barrick JE|            48.0|          48.0|        47.9|E. coli 6S RNA wa...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     149|    3576|             0|         0|               Gene;|Published; PMID:1...|             3309|                   0|           null|        |     0.56243|-10.04259|     1600000|        331091| 277| 188|           true|-3.5895|   0.71351|2013-10-03 22:05:06|2019-01-04 15:01:52|\n",
      "| RF00014|         DsrA|     1237|            DsrA RNA|           Bateman A|           Bateman A|            60.0|          61.5|        57.6|DsrA RNA regulate...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|       5|      35|             0|         0|         Gene; sRNA;|Published; PMID:9...|               39|                   0|           null|        |     0.53383| -8.38474|     1600000|        350673| 177|  85|           true|-3.3562|   0.71888|2013-02-01 11:56:19|2019-01-04 15:01:52|\n",
      "| RF00015|           U4|     1314| U4 spliceosomal RNA|  Griffiths-Jones SR|Zwieb C, The uRNA...|            46.0|          46.0|        45.9|U4 small nuclear ...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     170|    7522|             0|         0|Gene; snRNA; spli...|Published; PMID:2...|             1025|                   0|           null|        |     0.58145| -8.85604|     1600000|        407516| 575| 140|           true|-3.5007|   0.71795|2013-10-03 22:05:22|2019-01-04 15:01:52|\n",
      "| RF00016|      SNORD14|     1242|Small nucleolar R...|Griffiths-Jones S...|  Griffiths-Jones SR|            64.0|          64.1|        63.9|U14 small nucleol...|                 U14|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      18|    1182|             0|         0|Gene; snRNA; snoR...|    Predicted; PFOLD|              221|                   0|           null|        |     0.63073| -3.65386|     1600000|        232910| 229| 116|           true| -3.128|   0.71819|2013-02-01 11:56:23|2019-01-04 15:01:52|\n",
      "| RF00017|  Metazoa_SRP|     1315|Metazoan signal r...|          Gardner PP|Published; PMID:1...|            70.0|          70.0|        69.9|The signal recogn...|SRP_euk_arch; 7SL...|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      91|   42386|             0|         0|               Gene;|Published; PMID:1...|              402|                   0|           null|        |     0.64536| -9.85267|     1600000|        488632| 514| 301|           true|-4.0177|   0.70604|2013-10-03 22:07:53|2019-01-04 15:01:52|\n",
      "| RF00018|         CsrB|     2460|CsrB/RsmB RNA family|Bateman A, Gardne...|           Bateman A|            71.0|          71.4|        70.9|The CsrB RNA bind...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|      38|     254|             0|         0|         Gene; sRNA;|Predicted; PFOLD;...|              196|                   0|           null|        |     0.69326| -9.81172|     1600000|        546392| 555| 356|           true|-4.0652|   0.70388|2013-10-03 23:07:27|2019-01-04 15:01:52|\n",
      "| RF00019|        Y_RNA|     1317|               Y RNA|Griffiths-Jones S...|Griffiths-Jones S...|            38.0|          38.0|        37.9|Y RNAs are compon...|      Y1; Y2; Y3; Y5|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     104|    8521|             0|         0|               Gene;|Published; PMID:1...|              123|                   0|           null|        |     0.59183| -5.14312|     1600000|        189478| 249|  98|           true|-2.8418|    0.7187|2013-10-03 23:07:38|2019-01-04 15:01:52|\n",
      "| RF00020|           U5|     1318| U5 spliceosomal RNA|Griffiths-Jones S...|Zwieb C, The uRNA...|            40.0|          40.0|        39.9|U5 RNA is a compo...|                null|cmbuild -F CM SEED|cmcalibrate --mpi CM|cmsearch --cpu 4 ...|     180|    7524|             0|         0|Gene; snRNA; spli...|Published; PMID:2...|             1001|                   0|           null|        |     0.50732| -5.54774|     1600000|        339349| 331| 116|           true|-4.1327|    0.7182|2013-10-03 23:08:43|2019-01-04 15:01:52|\n",
      "+--------+-------------+---------+--------------------+--------------------+--------------------+----------------+--------------+------------+--------------------+--------------------+------------------+--------------------+--------------------+--------+--------+--------------+----------+--------------------+--------------------+-----------------+--------------------+---------------+--------+------------+---------+------------+--------------+----+----+---------------+-------+----------+-------------------+-------------------+\n",
      "only showing top 20 rows\n",
      "\n"
     ]
    }
   ],
   "source": [
    "#Loading data from a JDBC source\n",
    "dfMySQL = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:mysql://mysql-rfam-public.ebi.ac.uk:4497/Rfam\") \\\n",
    "    .option(\"dbtable\", \"Rfam.family\") \\\n",
    "    .option(\"user\", \"rfamro\") \\\n",
    "    .option(\"password\", \"\") \\\n",
    "    .option(\"driver\", \"com.mysql.jdbc.Driver\") \\\n",
    "    .load()\n",
    "\n",
    "dfMySQL.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-mysql-test-or-temp\"></a>\n",
    "#### Connect to a Test or Temporary MySQL Instance\n",
    "\n",
    "> **Note:** The following code won't work if the MySQL instance has been shut down."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true,
    "jupyter": {
     "outputs_hidden": true
    }
   },
   "outputs": [
    {
     "ename": "KeyboardInterrupt",
     "evalue": "",
     "output_type": "error",
     "traceback": [
      "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m                         Traceback (most recent call last)",
      "\u001b[0;32m<ipython-input-8-9522022e74e2>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[1;32m      5\u001b[0m     \u001b[0;34m.\u001b[0m\u001b[0moption\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"user\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"root\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m\\\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      6\u001b[0m     \u001b[0;34m.\u001b[0m\u001b[0moption\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"password\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"my-secret-pw\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m\\\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m     \u001b[0;34m.\u001b[0m\u001b[0moption\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"driver\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m\"com.mysql.jdbc.Driver\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m\\\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m      8\u001b[0m     \u001b[0;34m.\u001b[0m\u001b[0mload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m      9\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/spark/python/pyspark/sql/readwriter.py\u001b[0m in \u001b[0;36mload\u001b[0;34m(self, path, format, schema, **options)\u001b[0m\n\u001b[1;32m    170\u001b[0m             \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_df\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_jreader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_spark\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_sc\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_jvm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mPythonUtils\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mtoSeq\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    171\u001b[0m         \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 172\u001b[0;31m             \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_df\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_jreader\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mload\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    173\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    174\u001b[0m     \u001b[0;34m@\u001b[0m\u001b[0msince\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1.4\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py\u001b[0m in \u001b[0;36m__call__\u001b[0;34m(self, *args)\u001b[0m\n\u001b[1;32m   1253\u001b[0m             \u001b[0mproto\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mEND_COMMAND_PART\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1254\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1255\u001b[0;31m         \u001b[0manswer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgateway_client\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend_command\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcommand\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1256\u001b[0m         return_value = get_return_value(\n\u001b[1;32m   1257\u001b[0m             answer, self.gateway_client, self.target_id, self.name)\n",
      "\u001b[0;32m/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py\u001b[0m in \u001b[0;36msend_command\u001b[0;34m(self, command, retry, binary)\u001b[0m\n\u001b[1;32m    983\u001b[0m         \u001b[0mconnection\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_get_connection\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    984\u001b[0m         \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 985\u001b[0;31m             \u001b[0mresponse\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mconnection\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msend_command\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mcommand\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    986\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0mbinary\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    987\u001b[0m                 \u001b[0;32mreturn\u001b[0m \u001b[0mresponse\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_create_connection_guard\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mconnection\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py\u001b[0m in \u001b[0;36msend_command\u001b[0;34m(self, command)\u001b[0m\n\u001b[1;32m   1150\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1151\u001b[0m         \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 1152\u001b[0;31m             \u001b[0manswer\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msmart_decode\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstream\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mreadline\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m-\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m   1153\u001b[0m             \u001b[0mlogger\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mdebug\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Answer received: {0}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0manswer\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m   1154\u001b[0m             \u001b[0;32mif\u001b[0m \u001b[0manswer\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstartswith\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mproto\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mRETURN_MESSAGE\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;32m/conda/lib/python3.6/socket.py\u001b[0m in \u001b[0;36mreadinto\u001b[0;34m(self, b)\u001b[0m\n\u001b[1;32m    584\u001b[0m         \u001b[0;32mwhile\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    585\u001b[0m             \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 586\u001b[0;31m                 \u001b[0;32mreturn\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_sock\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mrecv_into\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mb\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m    587\u001b[0m             \u001b[0;32mexcept\u001b[0m \u001b[0mtimeout\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m    588\u001b[0m                 \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_timeout_occurred\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;32mTrue\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
      "\u001b[0;31mKeyboardInterrupt\u001b[0m: "
     ]
    }
   ],
   "source": [
    "dfMySQL = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:mysql://172.31.33.215:3306/db1\") \\\n",
    "    .option(\"dbtable\", \"db1.fruit\") \\\n",
    "    .option(\"user\", \"root\") \\\n",
    "    .option(\"password\", \"my-secret-pw\") \\\n",
    "    .option(\"driver\", \"com.mysql.jdbc.Driver\") \\\n",
    "    .load()\n",
    "\n",
    "dfMySQL.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-postgresql\"></a>\n",
    "### Connect to a PostgreSQL Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data from a JDBC source\n",
    "dfPS = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:postgresql:dbserver\") \\\n",
    "    .option(\"dbtable\", \"schema.tablename\") \\\n",
    "    .option(\"user\", \"username\") \\\n",
    "    .option(\"password\", \"password\") \\\n",
    "    .load()\n",
    "\n",
    "dfPS2 = spark.read \\\n",
    "    .jdbc(\"jdbc:postgresql:dbserver\", \"schema.tablename\",\n",
    "    properties={\"user\": \"username\", \"password\": \"password\"})\n",
    "\n",
    "# Specify DataFrame column data types on read\n",
    "dfPS3 = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:postgresql:dbserver\") \\\n",
    "    .option(\"dbtable\", \"schema.tablename\") \\\n",
    "    .option(\"user\", \"username\") \\\n",
    "    .option(\"password\", \"password\") \\\n",
    "    .option(\"customSchema\", \"id DECIMAL(38, 0), name STRING\") \\\n",
    "    .load()\n",
    "\n",
    "# Save data to a JDBC source\n",
    "dfPS.write \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:postgresql:dbserver\") \\\n",
    "    .option(\"dbtable\", \"schema.tablename\") \\\n",
    "    .option(\"user\", \"username\") \\\n",
    "    .option(\"password\", \"password\") \\\n",
    "    .save()\n",
    "\n",
    "dfPS2.write \\\n",
    "    properties={\"user\": \"username\", \"password\": \"password\"})\n",
    "\n",
    "# Specify create table column data types on write\n",
    "dfPS.write \\\n",
    "    .option(\"createTableColumnTypes\", \"name CHAR(64), comments VARCHAR(1024)\") \\\n",
    "    .jdbc(\"jdbc:postgresql:dbserver\", \"schema.tablename\", properties={\"user\": \"username\", \"password\": \"password\"})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-oracle\"></a>\n",
    "### Connect to an Oracle Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read a table from Oracle (table: hr.emp)\n",
    "dfORA = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:oracle:thin:username/password@//hostname:portnumber/SID\") \\\n",
    "    .option(\"dbtable\", \"hr.emp\") \\\n",
    "    .option(\"user\", \"db_user_name\") \\\n",
    "    .option(\"password\", \"password\") \\\n",
    "    .option(\"driver\", \"oracle.jdbc.driver.OracleDriver\") \\\n",
    "    .load()\n",
    "\n",
    "dfORA.printSchema()\n",
    "\n",
    "dfORA.show()\n",
    "\n",
    "# Read a query from Oracle\n",
    "query = \"(select empno,ename,dname from emp, dept where emp.deptno = dept.deptno) emp\"\n",
    "\n",
    "dfORA1 = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .option(\"url\", \"jdbc:oracle:thin:username/password@//hostname:portnumber/SID\") \\\n",
    "    .option(\"dbtable\", query) \\\n",
    "    .option(\"user\", \"db_user_name\") \\\n",
    "    .option(\"password\", \"password\") \\\n",
    "    .option(\"driver\", \"oracle.jdbc.driver.OracleDriver\") \\\n",
    "    .load()\n",
    "\n",
    "dfORA1.printSchema()\n",
    "\n",
    "dfORA1.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-ms-sql-server\"></a>\n",
    "### Connect to an MS SQL Server Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read a table from MS SQL Server\n",
    "dfMS = spark.read \\\n",
    "    .format(\"jdbc\") \\\n",
    "    .options(url=\"jdbc:sqlserver:username/password@//hostname:portnumber/DB\") \\\n",
    "    .option(\"dbtable\", \"db_table_name\") \\\n",
    "    .option(\"user\", \"db_user_name\") \\\n",
    "    .option(\"password\", \"password\") \\\n",
    "    .option(\"driver\", \"com.microsoft.sqlserver.jdbc.SQLServerDriver\" ) \\\n",
    "    .load()\n",
    "\n",
    "dfMS.printSchema()\n",
    "\n",
    "dfMS.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-to-redshift\"></a>\n",
    "### Connect to a Redshift Database"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Read data from a table\n",
    "dfRS = spark.read \\\n",
    "    .format(\"com.databricks.spark.redshift\") \\\n",
    "    .option(\"url\", \"jdbc:redshift://redshifthost:5439/database?user=username&password=pass\") \\\n",
    "    .option(\"dbtable\", \"my_table\") \\\n",
    "    .option(\"tempdir\", \"s3n://path/for/temp/data\") \\\n",
    "    .load()\n",
    "\n",
    "# Read data from a query\n",
    "dfRS = spark.read \\\n",
    "    .format(\"com.databricks.spark.redshift\") \\\n",
    "    .option(\"url\", \"jdbc:redshift://redshifthost:5439/database?user=username&password=pass\") \\\n",
    "    .option(\"query\", \"select x, count(*) my_table group by x\") \\\n",
    "    .option(\"tempdir\", \"s3n://path/for/temp/data\") \\\n",
    "    .load()\n",
    "\n",
    "# Write data back to a table\n",
    "dfRS.write \\\n",
    "  .format(\"com.databricks.spark.redshift\") \\\n",
    "  .option(\"url\", \"jdbc:redshift://redshifthost:5439/database?user=username&password=pass\") \\\n",
    "  .option(\"dbtable\", \"my_table_copy\") \\\n",
    "  .option(\"tempdir\", \"s3n://path/for/temp/data\") \\\n",
    "  .mode(\"error\") \\\n",
    "  .save()\n",
    "\n",
    "# Use IAM role-based authentication\n",
    "dfRS.write \\\n",
    "  .format(\"com.databricks.spark.redshift\") \\\n",
    "  .option(\"url\", \"jdbc:redshift://redshifthost:5439/database?user=username&password=pass\") \\\n",
    "  .option(\"dbtable\", \"my_table_copy\") \\\n",
    "  .option(\"tempdir\", \"s3n://path/for/temp/data\") \\\n",
    "  .option(\"aws_iam_role\", \"arn:aws:iam::123456789000:role/redshift_iam_role\") \\\n",
    "  .mode(\"error\") \\\n",
    "  .save()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-cleanup\"></a>\n",
    "## Cleanup\n",
    "\n",
    "Prior to exiting, release disk space, computation, and memory resources consumed by the active session:\n",
    "\n",
    "- [Delete Data](#spark-jdbc-delete-data)\n",
    "- [Release Spark Resources](#spark-jdbc-release-spark-resources)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-delete-data\"></a>\n",
    "### Delete Data\n",
    "\n",
    "You can optionally delete any of the directories or files that you created.\n",
    "See the instructions in the [Creating and Deleting Container Directories](https://www.iguazio.com/docs/latest-release/tutorials/getting-started/containers/#create-delete-container-dirs) tutorial.\n",
    "For example, the following code uses a local file-system command to delete a **&lt;running user&gt;/examples/spark-jdbc** directory in the \"users\" container.\n",
    "Edit the path, as needed, then remove the comment mark (`#`) and run the code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# !rm -rf /User/examples/spark-jdbc/"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<a id=\"spark-jdbc-release-spark-resources\"></a>\n",
    "### Release Spark Resources\n",
    "\n",
    "When you're done, run the following command to stop your Spark session and release its computation and memory resources:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "spark.stop()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
